FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA
Autoregressive convolutional neural networks (CNNs) have been widely
exploited for sequence generation tasks such as audio synthesis, language
modeling and neural machine translation. WaveNet is a deep autoregressive CNN
composed of several stacked layers of dilated convolution that is used for
sequence generation. While WaveNet produces state-of-the-art audio generation
results, the naive inference implementation is quite slow; it takes a few
minutes to generate just one second of audio on a high-end GPU. In this work,
we develop the first accelerator platform, FastWave, for autoregressive
convolutional neural networks, and address the associated design challenges. We
design the Fast-Wavenet inference model in Vivado HLS and perform a wide range
of optimizations including fixed-point implementation, array partitioning and
pipelining. Our model uses a fully parameterized parallel architecture for fast
matrix-vector multiplication that enables per-layer customized latency
fine-tuning for further throughput improvement. Our experiments comparatively
assess the trade-off between throughput and resource utilization for various
optimizations. Our best WaveNet design on the Xilinx XCVU13P FPGA, which uses
only on-chip memory, achieves 66x faster generation than the CPU
implementation and 11x faster generation than the GPU implementation.
Comment: Published as a conference paper at ICCAD 201
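The fixed-point matrix-vector multiplication mentioned in the FastWave abstract can be illustrated with a small software sketch. This is not the authors' Vivado HLS design: the function names and the choice of 8 fractional bits are assumptions made for illustration, and the loop is sequential rather than parallel.

```python
def to_fixed(x, frac_bits=8):
    """Quantize a float to a signed integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))


def fixed_matvec(weights, vector, frac_bits=8):
    """Matrix-vector product using integer multiply-accumulate only.

    `weights` is a list of rows and `vector` a list of floats. Both are
    quantized first; each accumulator then carries 2 * frac_bits
    fractional bits and is rescaled to a float for inspection.
    """
    wq = [[to_fixed(w, frac_bits) for w in row] for row in weights]
    vq = [to_fixed(v, frac_bits) for v in vector]
    scale = 1 << (2 * frac_bits)
    return [sum(w * v for w, v in zip(row, vq)) / scale for row in wq]


if __name__ == "__main__":
    W = [[0.5, -0.25], [1.0, 2.0]]
    v = [1.0, 2.0]
    print(fixed_matvec(W, v))
```

Values that are exact multiples of 2^-8 survive quantization unchanged; other values pick up a rounding error that shrinks as `frac_bits` grows, which is the precision versus resource trade-off a hardware design has to balance.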
zPROBE: Zero Peek Robustness Checks for Federated Learning
Privacy-preserving federated learning allows multiple users to jointly train
a model with coordination of a central server. The server only learns the final
aggregation result, thus the users' (private) training data is not leaked from
the individual model updates. However, keeping the individual updates private
allows malicious users to perform Byzantine attacks and degrade the accuracy
without being detected. The best existing defenses against Byzantine workers rely
on robust rank-based statistics, e.g., the median, to find malicious updates.
However, implementing privacy-preserving rank-based statistics is nontrivial
and not scalable in the secure domain, as it requires sorting all individual
updates. We establish the first private robustness check that uses high break
point rank-based statistics on aggregated model updates. By exploiting
randomized clustering, we significantly improve the scalability of our defense
without compromising privacy. We leverage our statistical bounds in
zero-knowledge proofs to detect and remove malicious updates without revealing
the private user updates. Our novel framework, zPROBE, enables Byzantine
resilient and secure federated learning. Empirical evaluations demonstrate that
zPROBE provides a low overhead solution to defend against state-of-the-art
Byzantine attacks while preserving privacy.
Comment: ICCV 202
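The idea of applying a rank-based statistic to aggregated updates from random clusters can be sketched in plaintext as follows. This is only an illustration of the statistics, not the zPROBE protocol: there is no secure aggregation and no zero-knowledge proof, each update is reduced to a single scalar, and the function name, `threshold`, and `seed` are hypothetical.

```python
import random
import statistics


def flag_outliers(updates, num_clusters, threshold, seed=0):
    """Return the indices of users whose update is far from a robust center.

    Users are shuffled into `num_clusters` random clusters, only the
    per-cluster means are inspected, and the median of those means serves
    as the robust center. Assumes num_clusters <= len(updates).
    """
    rng = random.Random(seed)
    users = list(range(len(updates)))
    rng.shuffle(users)
    clusters = [users[i::num_clusters] for i in range(num_clusters)]
    cluster_means = [sum(updates[u] for u in c) / len(c) for c in clusters]
    center = statistics.median(cluster_means)
    # In a private protocol each user would prove this bound without
    # revealing the update; here the comparison is done in the clear.
    return {u for u in users if abs(updates[u] - center) > threshold}
```

Because the median is taken over cluster means rather than over every individual update, far fewer values need to be ranked, which is the scalability benefit the abstract attributes to randomized clustering.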
MPCircuits: Optimized Circuit Generation for Secure Multi-Party Computation
Secure Multi-party Computation (MPC) is one of the most influential achievements of modern cryptography: it allows evaluation of an arbitrary function on private inputs from multiple parties without revealing the inputs. A crucial step in using contemporary MPC protocols is to describe the function as a Boolean circuit. While efficient solutions have been proposed for the special case of two-party secure computation, the general case of more than two parties has not been addressed. This paper proposes MPCircuits, the first automated solution to devise the optimized Boolean circuit representation for any MPC function using hardware synthesis tools with new customized libraries that are scalable to multiple parties. MPCircuits creates a new end-to-end tool-chain to facilitate practical scalable MPC realization. To illustrate the practicality of MPCircuits, we design and implement a set of five circuits that represent real-world MPC problems. Our benchmarks inherently have different computational and communication complexities and are good candidates to evaluate MPC protocols. We also formalize the metrics by which a given protocol can be analyzed. We provide extensive experimental evaluations for these benchmarks, two of which are the first reported solutions in multi-party settings. As our experimental results indicate, MPCircuits reduces the computation time of MPC protocols by up to 4.2x.
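To make "describing the function as a Boolean circuit" concrete, the sketch below writes a one-bit full adder as a list of XOR and AND gates and evaluates it in the clear. The netlist format and all names are assumptions chosen for illustration; they are not the MPCircuits tool-chain or its circuit format.

```python
# Two-input gate types used by the toy netlist.
GATES = {
    "XOR": lambda a, b: a ^ b,
    "AND": lambda a, b: a & b,
}

# Each entry is (output wire, gate type, input wire, input wire).
FULL_ADDER = [
    ("t1", "XOR", "a", "b"),
    ("sum", "XOR", "t1", "cin"),
    ("t2", "AND", "a", "b"),
    ("t3", "AND", "t1", "cin"),
    ("cout", "XOR", "t2", "t3"),
]


def evaluate(circuit, inputs):
    """Evaluate gates in listed order and return the values of all wires."""
    wires = dict(inputs)
    for out, op, x, y in circuit:
        wires[out] = GATES[op](wires[x], wires[y])
    return wires


def count_gates(circuit, op):
    """Count gates of one type, e.g. the AND gates."""
    return sum(1 for _, gate, _, _ in circuit if gate == op)
```

An MPC protocol would evaluate the same gate list on secret-shared or encrypted wire values instead of plain bits. Counting AND gates separately is a common way to compare circuits, since in many protocols XOR gates are much cheaper than AND gates.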
Holistic Algorithm and System Co-Optimization for Trustworthy and Platform-Aware Deep Learning
Simultaneous growth in the volume of available data along with rapid advancements in computing and hardware technology have paved the way for unprecedented breakthroughs in the field of Artificial Intelligence (AI). In particular, a modern class of AI algorithms, dubbed Deep Learning (DL), has shown great promise by achieving or even surpassing human-level capabilities in many tasks. The rise of DL has brought forth a new industrial revolution by taking over the modern landscape of smart applications, e.g., self-driving cars, virtual assistants, drug discovery, and manufacturing. Nevertheless, to date, there exist quite a few challenges for wide-scale adoption of DL in real-life scenarios. Firstly, confidence characterization and ensuring robustness of DL-enabled services is imperative, particularly in safety-critical autonomous systems. Secondly, concerns over the scalability and efficiency of DL hinder its training and deployment on diverse hardware platforms. This dissertation addresses the above-mentioned challenges via a holistic customization of DL algorithm and system from the standpoint of task-based metrics (e.g., accuracy), physical constraints (e.g., memory and power budget), as well as new design metrics that facilitate DL integration in safety-sensitive tasks. The presented research in this dissertation interlinks theoretical fundamentals, domain-specific architecture design, and automated tools that enable co-optimization of the DL algorithm with the underlying platform while satisfying various constraints. The key contributions of this dissertation are as follows:
1) Devising CuRTAIL, the first end-to-end and automated framework that simultaneously enables efficient and safe execution of DL models in the face of adversarial attacks. CuRTAIL formalizes the goal of thwarting adversarial attacks as an optimization problem and trains parallel defense modules to minimize vulnerability. The proposed framework leverages hardware/algorithm co-design and customized acceleration to enable just-in-time execution in resource-constrained settings.
2) Designing a novel framework, dubbed ACCHASHTAG, which identifies any faults occurring during DL inference in real time. I propose to summarize the ground-truth DL model as a unique hash signature, which is used to verify the model’s integrity on the fly. Notably, ACCHASHTAG, for the first time, provides guaranteed lower bounds on the detection rate using a formal statistical analysis of hash collision.
3) Proposing CLEANN, the first end-to-end framework that enables online mitigation of backdoor, a.k.a. Trojan, attacks on DL. CLEANN uses sparse recovery and statistical analysis to identify incoming Trojan samples and remove their effect on the victim model’s prediction. I design the algorithmic solutions as well as customized hardware-accelerated engines to enable real-time DL model decision verification via CLEANN.
4) Innovating an approach for restructuring inter-layer connections in DL models, leading to faster convergence to a desired accuracy during training. This is achieved by transforming the DL model into a small-world network using principles from graph theory. The obtained DL model, dubbed SWANN, is a highly-connected, small-world topology with enhanced signal propagation characteristics and faster learning speed.
5) Developing LTS, the first training-free, hardware-aware neural architecture search for autoregressive Transformers. The proposed method delivers high-performance specialized architectures for inference on a target hardware. The core of LTS is an ultra-low-cost proxy that can estimate the performance of candidate architectures without any need for training. Using this novel proxy, the search can be performed entirely on the target hardware, allowing us to incorporate hardware measurements, e.g., peak memory utilization and latency, within the architecture search loop.
6) Automating DL model customization for various target hardware by formulating it as a constrained optimization. The optimization goal is to compress a large model to satisfy given accuracy and hardware performance constraints. I propose a highly-scalable blackbox optimizer, dubbed AdaNS, to solve the aforesaid optimization problem. AdaNS leverages adaptive non-uniform sampling with carefully crafted probabilistic distributions to locate and reconstruct the optimization objective function around its maximizers.
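The hash-signature idea in contribution 2 (summarizing a ground-truth model as a signature and checking it on the fly) can be sketched as below. This is a rough illustration rather than ACCHASHTAG itself: it swaps in SHA-256 from the Python standard library as the hash, treats each layer as a flat list of floats, and uses hypothetical function names.

```python
import hashlib
import struct


def model_signature(layers):
    """Hash a model given as a list of layers, each a flat list of floats."""
    digest = hashlib.sha256()
    for layer in layers:
        # Prefix each layer with its length so layer boundaries affect the hash.
        digest.update(struct.pack("<Q", len(layer)))
        digest.update(struct.pack(f"<{len(layer)}d", *layer))
    return digest.hexdigest()


def is_intact(layers, reference_signature):
    """Compare the current weights against a signature recorded earlier."""
    return model_signature(layers) == reference_signature


if __name__ == "__main__":
    weights = [[0.1, -0.2, 0.3], [1.5, 2.5]]
    reference = model_signature(weights)
    weights[1][0] = 1.5000001  # simulate a fault in one parameter
    print(is_intact(weights, reference))
```

A cryptographic hash over all weights is simple but costly to recompute at every inference; a real-time scheme would need a much lighter signature, which is where an analysis of hash collisions becomes relevant.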
Machine learning-assisted E-jet printing of organic flexible electronics
The electrohydrodynamic-jet (e-jet) printing technique enables the high-resolution printing of complex soft electronic devices. As such, it has an unmatched potential for becoming the conventional technique for printing soft electronic devices. In this study, the electrical conductivity of the e-jet printed circuits was studied as a function of key printing parameters (nozzle speed, ink flow rate, and voltage). The collected experimental dataset was then used to train machine learning models capable of predicting the characteristics of the printed circuits in real time. A decision tree applied to the dataset achieved an accuracy of 0.72; further evaluation showed that pruning the tree increased accuracy, while sensitivity decreased in highly pruned trees. k-fold cross-validation (CV) was used in model selection to test how well each model could be trained on the data. CV accuracy was highest for random forest at 0.83 and the K-NN model (k = 10) at 0.82. Precision parameters were compared to evaluate the supervised classification models. According to F-measure values, the K-NN model (k = 10) and random forest are the best methods to classify the conductivity of electrodes.
This is a manuscript of an article published as Shirsavar, Mehran Abbasi, Mehrnoosh Taghavimehr, Lionel J. Ouedraogo, Mojan Javaheripi, Nicole N. Hashemi, Farinaz Koushanfar, and Reza Montazami. "Machine learning-assisted E-jet printing of organic flexible electronics." Biosensors and Bioelectronics (2022): 114418.
DOI: 10.1016/j.bios.2022.114418.
Copyright 2022 Elsevier B.V.
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).
Posted with permission
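The k-fold cross-validation of a K-NN classifier described in the e-jet printing abstract can be sketched in plain Python as follows. The data here is synthetic and stands in for the three printing parameters; it is not the study's dataset, the function names are hypothetical, and the example uses k = 3 neighbors rather than the paper's k = 10 to keep the vote free of ties.

```python
import random


def knn_predict(train, query, k):
    """Majority vote among the k training samples nearest to `query`."""
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def k_fold_accuracy(data, folds, k):
    """Average accuracy when each fold is held out once as the test set."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [s for i, s in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def synthetic_data(n_per_class=20, seed=0):
    """Two well-separated classes with three features each (synthetic)."""
    rng = random.Random(seed)
    data = []
    for label, base in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            features = [base + rng.uniform(-1, 1) for _ in range(3)]
            data.append((features, label))
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    print(k_fold_accuracy(synthetic_data(), folds=5, k=3))
```

Every sample is used for testing exactly once, so the reported accuracy reflects how the model performs on data it was not trained on.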